61. An automatically built named entity lexicon for Arabic
In: Attia, Mohammed, Toral, Antonio orcid:0000-0003-2357-2960 , Tounsi, Lamia, Monachini, Monica and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) An automatically built named entity lexicon for Arabic. In: LREC 2010 - 7th conference on International Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta. (2010)
Abstract:
We have successfully adapted and extended the automatic Multilingual, Interoperable Named Entity Lexicon approach to Arabic, using Arabic WordNet (AWN) and Arabic Wikipedia (AWK). First, we extract AWN's instantiable nouns and identify the corresponding categories and hyponym subcategories in AWK. Then, we exploit Wikipedia inter-lingual links to locate correspondences between articles in ten different languages in order to identify Named Entities (NEs). For Arabic articles that have no correspondence in any of the other languages, we apply keyword search on AWK abstracts. In addition, we perform a post-processing step to fetch further NEs from AWK that are not reachable through AWN. Finally, we investigate diacritization, using matching with GeoNames databases, the MADA-TOKAN tools and different heuristics, to restore the vowel marks of Arabic NEs. Using this methodology, we have extracted approximately 45,000 Arabic NEs and built, to the best of our knowledge, the largest, most mature and well-structured Arabic NE lexical resource to date. We have stored and organised this lexicon following the Lexical Markup Framework (LMF) ISO standard. We conduct a quantitative and qualitative evaluation of the lexicon against a manually annotated gold standard and achieve precision scores ranging from 95.83% (with 66.13% recall) to 99.31% (with 61.45% recall), depending on the value of a threshold.
Keyword:
Machine translating
URL: http://doras.dcu.ie/15979/

62. Seeding statistical machine translation with translation memory output through tree-based structural alignment
In: Zhechev, Ventsislav and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Seeding statistical machine translation with translation memory output through tree-based structural alignment. In: SSST-4 - 4th Workshop on Syntax and Structure in Statistical Translation, 28 August 2010, Beijing, China. (2010)

63. Arabic parsing using grammar transforms
In: Tounsi, Lamia and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Arabic parsing using grammar transforms. In: LREC 2010 - 7th conference on International Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta. (2010)

64. LFG without C-structures
In: Cetinoglu, Ozlem, Foster, Jennifer orcid:0000-0002-7789-4853 , Nivre, Joakim, Hogan, Deirdre, Cahill, Aoife orcid:0000-0002-3519-7726 and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) LFG without C-structures. In: the 9th International Workshop on Treebanks and Linguistic Theories, 3-4 Dec. 2010, Tartu, Estonia. (2010)

65. Closing the gap between stochastic and rule-based LFG grammars
In: Hautli, Annette, Cetinoglu, Ozlem and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Closing the gap between stochastic and rule-based LFG grammars. In: the LFG10 Conference, 18-20 July 2010, Ottawa, Canada. (2010)

66. Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French
In: Seddah, Djamé, Chrupała, Grzegorz, Cetinoglu, Ozlem, van Genabith, Josef orcid:0000-0003-1322-7944 and Candito, Marie (2010) Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French. In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA. (2010)

67. Deep syntax language models and statistical machine translation
In: Graham, Yvette and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Deep syntax language models and statistical machine translation. In: SSST-4 - 4th Workshop on Syntax and Structure in Statistical Translation at COLING 2010, 28 August 2010, Beijing, China. (2010)

68. Treebank-based automatic acquisition of wide coverage, deep linguistic resources for Japanese
Oya, Masanori. Dublin City University, National Centre for Language Technology (NCLT), 2010; Dublin City University, School of Computing, 2010
In: Oya, Masanori (2010) Treebank-based automatic acquisition of wide coverage, deep linguistic resources for Japanese. Master of Science thesis, Dublin City University. (2010)

69. Automatic extraction of Arabic multiword expressions
In: Attia, Mohammed, Tounsi, Lamia, Pecina, Pavel, van Genabith, Josef orcid:0000-0003-1322-7944 and Toral, Antonio (2010) Automatic extraction of Arabic multiword expressions. In: the 7th Conference on Language Resources and Evaluation (LREC 2010), May 2010, Valletta, Malta. (2010)

70. Two approaches to automatic matching of atomic grammatical features in LFG
In: Bryl, Anton and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Two approaches to automatic matching of atomic grammatical features in LFG. In: LFG10 Conference, 18-20 July 2010, Ottawa, Canada. (2010)

71. Partial dependency parsing for Irish
In: Uí Dhonnchadha, Elaine and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Partial dependency parsing for Irish. In: LREC 2010: Language Resources and Evaluation Conference, 17-23 May 2010, Malta. (2010)

72. Combining multi-domain statistical machine translation models using automatic classifiers
In: Banerjee, Pratyush, Du, Jinhua orcid:0000-0002-3267-4881 , Li, Baoli, Kumar Naskar, Sudip, Way, Andy orcid:0000-0001-5736-5930 and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Combining multi-domain statistical machine translation models using automatic classifiers. In: AMTA 2010 - 9th Conference of the Association for Machine Translation in the Americas, 31 October - 4 November 2010, Denver, CO, USA. (2010)

73. Integrating N-best SMT outputs into a TM system
In: He, Yifan, Ma, Yanjun, Way, Andy orcid:0000-0001-5736-5930 and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Integrating N-best SMT outputs into a TM system. In: COLING 2010 - 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing, China. (2010)

74. Bridging SMT and TM with translation recommendation
In: He, Yifan, Ma, Yanjun, van Genabith, Josef orcid:0000-0003-1322-7944 and Way, Andy orcid:0000-0001-5736-5930 (2010) Bridging SMT and TM with translation recommendation. In: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, 11-16 July 2010, Uppsala, Sweden. (2010)

75. Handling unknown words in statistical latent-variable parsing models for Arabic, English and French
In: Attia, Mohammed, Foster, Jennifer orcid:0000-0002-7789-4853 , Hogan, Deirdre, Le Roux, Joseph, Tounsi, Lamia and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Handling unknown words in statistical latent-variable parsing models for Arabic, English and French. In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA. (2010)

76. Finding common ground: towards a surface realisation shared task
In: Belz, Anya, White, Mike, van Genabith, Josef orcid:0000-0003-1322-7944 , Hogan, Deirdre and Stent, Amanda (2010) Finding common ground: towards a surface realisation shared task. In: INLG 2010 - 6th International Natural Language Generation Conference, 7-9 July 2010, Trim, Co. Meath, Ireland. (2010)

77. Hard constraints for grammatical function labelling
In: Seeker, Wolfgang, Rehbein, Ines, Kuhn, Jonas and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Hard constraints for grammatical function labelling. In: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, 11-16 July 2010, Uppsala, Sweden. (2010)

78. The DCU dependency-based metric in WMT-MetricsMATR 2010
In: He, Yifan, Du, Jinhua orcid:0000-0002-3267-4881 , Way, Andy orcid:0000-0001-5736-5930 and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) The DCU dependency-based metric in WMT-MetricsMATR 2010. In: WMT 2010 - Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR at ACL 2010, 15-16 July 2010, Uppsala, Sweden. (2010)

79. Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French
In: Proceedings of the First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), 2010, United States, pp. 67-75. https://hal.archives-ouvertes.fr/hal-00702414 (2010)

80. Guessing the grammatical function of a non-root f-structure in LFG
In: Bryl, Anton, van Genabith, Josef orcid:0000-0003-1322-7944 and Graham, Yvette (2009) Guessing the grammatical function of a non-root f-structure in LFG. In: IWPT 2009 - 11th International Conference on Parsing Technologies, 7-9 October 2009, Paris, France. (2009)